Goto

Collaborating Authors

 indicator function


On the VC dimension of deep group convolutional neural networks

Neural Information Processing Systems

Equivariant neural networks outperform traditional deep neural networks on a number of tasks. The theoretical understanding of their generalization properties remains, however, limited. In this paper, we analyze the generalization capabilities of Group Convolutional Neural Networks (GCNNs) with ReLU activation function through the lens of Vapnik-Chervonenkis (VC) dimension theory. By deriving upper and lower bounds, we investigate how the network architecture affects the VC dimension.


A Note on Non-Negative $L_1$-Approximating Polynomials

arXiv.org Machine Learning

$L_1$-Approximating polynomials, i.e., polynomials that approximate indicator functions in $L_1$-norm under certain distributions, are widely used in computational learning theory. We study the existence of \textit{non-negative} $L_1$-approximating polynomials with respect to Gaussian distributions. This is a stronger requirement than $L_1$-approximation but weaker than sandwiching polynomials (which themselves have many applications). These non-negative approximating polynomials have recently found uses in smoothed learning from positive-only examples. In this short note, we prove that every class of sets with Gaussian surface area (GSA) at most $ฮ“$ under the standard Gaussian admits degree-$k$ non-negative polynomials that $\eps$-approximate its indicator functions in $L_1$-norm, for $k=\tilde{O}(ฮ“^2/\varepsilon^2)$. Equivalently, finite GSA implies $L_1$-approximation with the stronger pointwise guarantee that the approximating polynomial has range contained in $[0,\infty)$. Up to a constant-factor, this matches the degree of the best currently known Gaussian $L_1$-approximation degree bound without the non-negativity constraint.


Appendices

Neural Information Processing Systems

When e 6 Wฮฆ, we have E = Rd and Wฮฆ,E = Wฮฆ. By Theorem 1 in [10], we know that the projected Bellman equation (3.4) has a unique fixed point ฮธ . Thus, L= {ฮธ }. 2. When e Wฮฆ, ฮธe is a unique solution to ฮฆฮธ = eas ฮฆ is full column rank. We first show that the set of solutions to the projected Bellman equation (3.4) takes the form { ฮธ+ cฮธe|c R}, where ฮธis any solution to (3.4). On the other hand, suppose that ฮธis not of the form ฮธ+ cฮธe.






AutoPrune: AutomaticNetworkPruningby RegularizingAuxiliaryParameters

Neural Information Processing Systems

Tobuildabettergeneralized and easy-to-use pruning method, we propose AutoPrune, which prunes the network through optimizing a set of trainable auxiliary parameters instead of original weights.


Batched Thompson Sampling

Neural Information Processing Systems

O (log log(T)) expected batch complexity. This is achieved through a dynamic batching strategy, which uses the agents estimates to adaptively increase the batch duration.